Live freelance tracking. Raw descriptions turned into structured data. Find your next tech project without the noise.
freelancer.com 🟡 2026-05-31
🔹 ReJust PDF Corpus Archive
👤 Client: Iasi, Romania · Member since 2005-04-14
💰 Price: $6,127 (average bid)
🚩 Problem: The client needs a complete, verifiable archive of 54 million PDFs from rejust.ro, saved under strict naming conventions.
📦 Existing: Not specified
Specifications:
[Target] rejust.ro
[Method] Automated crawling with throttling, retries, and auto-requeueing
[Format] PDF
[Format] URL-to-saved-path mapping (log or database)
[UI/UX] Filenames based on source page titles (UTF-8)
[UI/UX] Logical directory hierarchy (e.g., first-letter buckets)
[Stack] Python, Scrapy, Playwright
[Stack] Cloud Storage (AWS S3, GCS)
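A minimal Python sketch of the [UI/UX] naming rules, assuming the page title has already been extracted from the source page. The function names, the 10-character URL hash suffix (added so pages with identical titles do not overwrite each other), the 200-byte name cap, and the `0-9` and `_` fallback buckets are hypothetical choices, not part of the brief. The returned relative path could double as the S3/GCS object key.

```python
import hashlib
import re
import unicodedata
from pathlib import PurePosixPath

# Characters that are unsafe in filenames (Windows/POSIX) or awkward in object keys.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')
# Hypothetical byte cap: keeps name + "__" + hash + ".pdf" under the common 255-byte limit.
_MAX_NAME_BYTES = 200


def title_to_filename(title: str, url: str) -> str:
    """Turn a page title into a UTF-8 filename, keeping Romanian diacritics."""
    name = unicodedata.normalize("NFC", title)  # one canonical form for ă, â, î, ș, ț
    name = re.sub(r"\s+", " ", name)  # collapse newlines/tabs/runs of spaces first
    name = _UNSAFE.sub("_", name).strip(" .")
    if not name:
        name = "untitled"
    # Truncate on a UTF-8 byte budget without splitting a multi-byte character.
    name = name.encode("utf-8")[:_MAX_NAME_BYTES].decode("utf-8", errors="ignore").rstrip(" .")
    # Short URL hash (hypothetical) so identical titles never collide.
    suffix = hashlib.sha1(url.encode("utf-8")).hexdigest()[:10]
    return f"{name}__{suffix}.pdf"


def bucket_path(filename: str) -> str:
    """Prefix a filename with a first-letter bucket, e.g. 'D/Decizia ....pdf'."""
    # Drop diacritics for the bucket name only: 'Ș' -> 'S', 'Î' -> 'I'.
    base = unicodedata.normalize("NFD", filename[0])[0].upper()
    if len(base) == 1 and "A" <= base <= "Z":
        bucket = base
    elif base.isascii() and base.isdigit():
        bucket = "0-9"
    else:
        bucket = "_"
    return str(PurePosixPath(bucket, filename))
```

Bucketing on an ASCII-folded first letter keeps the top level to roughly 28 directories while the filename itself keeps its original UTF-8 title.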
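The [Method] line (throttling, retries, auto-requeueing) can be sketched in plain Python with an injected `fetch` function, so the queue logic runs without touching the network. `crawl`, `TransientError`, and the default limits are hypothetical. In a Scrapy project the built-in `DOWNLOAD_DELAY`, `AUTOTHROTTLE_ENABLED`, and `RETRY_TIMES` settings cover similar ground; this only illustrates the idea. Failed URLs go to the back of the queue, with no per-URL backoff beyond that.

```python
import collections
import time
from typing import Callable, Deque, Dict, Iterable, Tuple


class TransientError(Exception):
    """Raised by `fetch` for retryable failures (e.g. timeouts, HTTP 429/5xx)."""


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    min_interval: float = 1.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Dict[str, bytes], Dict[str, int]]:
    """Fetch URLs no faster than one request per `min_interval` seconds.

    A URL that raises TransientError is requeued at the back instead of
    blocking the queue; after `max_attempts` failures it lands in `failed`.
    """
    queue: Deque[Tuple[str, int]] = collections.deque((u, 0) for u in urls)
    results: Dict[str, bytes] = {}
    failed: Dict[str, int] = {}
    last_request = float("-inf")

    while queue:
        url, attempts = queue.popleft()
        # Global throttle: wait until min_interval has passed since the last request.
        wait = last_request + min_interval - clock()
        if wait > 0:
            sleep(wait)
        last_request = clock()
        try:
            results[url] = fetch(url)
        except TransientError:
            attempts += 1
            if attempts < max_attempts:
                queue.append((url, attempts))  # auto-requeue at the back
            else:
                failed[url] = attempts
    return results, failed
```

Injecting `sleep` and `clock` lets a test drive the throttle with a fake clock; `failed` doubles as the list of pages to report as inaccessible.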
Workflow:
1. Proof-of-concept extraction of 500 PDFs.
2. Full-scale crawl of all accessible pages.
3. PDF generation, with each file named after its source page title.
4. Storage in cloud buckets with directory categorization.
5. Generation of a URL-to-path verification log.
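For the URL-to-path log (step 5), a minimal sketch using SQLite from the Python standard library, assuming files are staged on local disk before upload. The `url_map` table, its columns, the function names, and the SHA-256 checksum are hypothetical choices; storing a checksum is what makes the log verifiable rather than just a list of paths.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import List


def open_log(db_path: str = "url_map.sqlite") -> sqlite3.Connection:
    """Open (or create) the hypothetical URL-to-path mapping table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS url_map (
               url    TEXT PRIMARY KEY,
               path   TEXT NOT NULL UNIQUE,
               sha256 TEXT NOT NULL,
               size   INTEGER NOT NULL
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, url: str, path: str, data: bytes) -> None:
    """Log where a URL's PDF was saved, with its checksum and size."""
    conn.execute(
        "INSERT OR REPLACE INTO url_map (url, path, sha256, size) VALUES (?, ?, ?, ?)",
        (url, path, hashlib.sha256(data).hexdigest(), len(data)),
    )
    conn.commit()


def verify(conn: sqlite3.Connection, root: Path) -> List[str]:
    """Return URLs whose saved file is missing or no longer matches its checksum."""
    bad = []
    for url, path, digest in conn.execute("SELECT url, path, sha256 FROM url_map"):
        file = root / path
        if not file.is_file() or hashlib.sha256(file.read_bytes()).hexdigest() != digest:
            bad.append(url)
    return bad
```

Committing after every row keeps the sketch simple; at 54 million rows, batching commits (for example every few thousand inserts) would be far faster.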